Viviana Lara, Justin Cline
In the realm of fantasy literature and cinema, few franchises have captured the imagination quite like Harry Potter. But beyond the spellbinding narratives lies a wealth of data waiting to be explored. Our analysis aims to uncover the hidden patterns and insights within the Harry Potter movie series, using a comprehensive dataset that spans characters, dialogue, spells, and movie statistics.
The central question we seek to answer is: How do the various elements of the Harry Potter universe interact and evolve throughout the series? To address this, we’ll be diving into multiple CSV files, including:
By analyzing this rich dataset, we aim to reveal trends in character development, explore the complexity of magic over time, and even draw connections between the fictional world and real-world factors like movie budgets and audience reception. Whether you’re a die-hard fan or a data enthusiast, this analysis promises to shed new light on the intricate tapestry of the wizarding world, all through the lens of data science.
The datasets used in this analysis come from two main sources (Kaggle):
Our analysis uses various R packages to process and visualize the data:
Through statistical analysis and data visualization, we explore several key aspects:
This analysis will provide fans, researchers, and storytellers with quantitative insights into the intricate world of Harry Potter, revealing patterns that might not be immediately apparent through casual viewing or reading.

    ## Rows: 61
    ## Columns: 5
    ## $ Spell.ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
    ## $ Incantation <chr> "Accio", "Aguamenti", "Alarte Ascendare", "Alohomora", "Ar…
    ## $ Spell.Name <chr> "Summoning Charm", "Water-Making Spell", "Launch an object…
    ## $ Effect <chr> "Summons an object", "Conjures water", "Rockets target upw…
    ## $ Light <chr> "", "Icy Blue", "Red", "Blue", "Blue", "", "Green", "", "B…

    ## Rows: 74
    ## Columns: 3
    ## $ Place.ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
    ## $ Place.Name <chr> "Flourish & Blotts", "Gringotts Wizarding Bank", "Knock…
    ## $ Place.Category <chr> "Diagon Alley", "Diagon Alley", "Diagon Alley", "Diagon…

    ## Rows: 7,444
    ## Columns: 5
    ## $ Dialogue.ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
    ## $ Chapter.ID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, …
    ## $ Place.ID <int> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, …
    ## $ Character.ID <int> 4, 7, 4, 7, 4, 7, 4, 5, 4, 5, 7, 4, 7, 4, 4, 4, 32, 31, 3…
    ## $ Dialogue <chr> "I should have known that you would be here...Professor M…

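
The dotted column names in the summaries above (for example `Spell.ID` and `Place.Category`) are what base R's `read.csv` produces when a CSV header contains spaces. The sketch below illustrates that behaviour on a toy file. It assumes the raw headers use spaces, which the source does not state, and the two rows are a small stand-in rather than the real Spells.csv. Base `str()` is used here as a dependency-free counterpart to a glimpse-style overview.

```r
# Toy stand-in for Spells.csv. The header with spaces is an assumption.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Spell ID,Incantation,Spell Name,Effect,Light",
    "1,Accio,Summoning Charm,Summons an object,",
    "2,Aguamenti,Water-Making Spell,Conjures water,Icy Blue"
  ),
  csv_path
)

# check.names = TRUE (the default) rewrites "Spell ID" as "Spell.ID".
spells <- read.csv(csv_path, stringsAsFactors = FALSE)

# Compact overview of column names, types, and first values.
str(spells)
```

Empty fields in a character column, such as the missing `Light` value for Accio, are read as empty strings rather than `NA`, which matches the `""` entries visible in the summary.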
This analysis explores the characteristics and demographics of different Hogwarts houses and magical schools using the Characters dataset. We examine several key variables including:
The main houses and schools in the Harry Potter universe include:
In the Harry Potter universe, there are several distinct blood statuses:
A Patronus is a powerful defensive charm in the Harry Potter universe that manifests as a bright, silvery-white guardian or protector taking the form of an animal. This advanced magic serves primarily as protection against dark creatures like Dementors and Lethifolds.
Each witch or wizard’s Patronus typically takes a unique animal form that reflects their personality.
The incantation “Expecto Patronum” is used to conjure a Patronus, but the spell requires both the incantation and intense focus on a powerful, happy memory to be successful.
| Patronus Form | Number of Characters | Percentage |
|---|---|---|
| Non-corporeal | 29 | 51.8 |
| None | 7 | 12.5 |
| Cat | 2 | 3.6 |
| Doe | 2 | 3.6 |
| Stag | 2 | 3.6 |
| Boar | 1 | 1.8 |
| Fox | 1 | 1.8 |
| Goat | 1 | 1.8 |
| Hare | 1 | 1.8 |
| Horse | 1 | 1.8 |
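
A frequency table of this kind can be built with base R's `table()`. The following is a minimal sketch on a hypothetical vector of Patronus forms; the values and the object names (`patronus`, `patronus_summary`) are illustrative and not taken from the Characters dataset. In the table above, the percentages appear to be relative to 56 characters in total (29 of 56 is 51.8%).

```r
# Hypothetical Patronus column; values are illustrative, not the real data.
patronus <- c("Stag", "Doe", "Non-corporeal", "Non-corporeal", "Stag",
              "Cat", "None", "Non-corporeal")

# Count each form and order from most to least common.
counts <- sort(table(patronus), decreasing = TRUE)

patronus_summary <- data.frame(
  Patronus.Form = names(counts),
  Characters = as.integer(counts),
  # Share of all characters in the vector, rounded to one decimal place.
  Percentage = round(100 * as.integer(counts) / sum(counts), 1)
)

print(patronus_summary)
```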
To analyze dialogue patterns across the Harry Potter movie series, we examined the following datasets:
Key variables analyzed include:

Word Statistics

| Word | Frequency | Share of Top-20 Occurrences (%) |
|---|---|---|
| harry | 690 | 22.38 |
| potter | 295 | 9.57 |
| sir | 198 | 6.42 |
| professor | 172 | 5.58 |
| dumbledore | 164 | 5.32 |
| ron | 164 | 5.32 |
| time | 159 | 5.16 |
| hermione | 139 | 4.51 |
| hagrid | 118 | 3.83 |
| yeah | 107 | 3.47 |
| boy | 103 | 3.34 |
| kill | 101 | 3.28 |
| dobby | 94 | 3.05 |
| hogwarts | 91 | 2.95 |
| wand | 84 | 2.72 |
| sirius | 83 | 2.69 |
| bit | 82 | 2.66 |
| voldemort | 82 | 2.66 |
| dark | 80 | 2.59 |
| day | 77 | 2.50 |
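
The percentages in this table sum to 100 across the 20 listed words (the frequencies total 3,083), so each value is a word's share among the top 20 rather than among all words spoken. A minimal base R sketch of that calculation follows. The dialogue lines, the stop-word list, and the `top_n` cutoff are hypothetical, and the original analysis may have used a different tokenizer.

```r
# Hypothetical dialogue lines and stop-word list, for illustration only.
dialogue <- c(
  "Harry Potter, you are a wizard.",
  "Professor, I think Harry is in danger.",
  "Harry! Ron! Over here!"
)
stop_words <- c("you", "are", "a", "i", "is", "in", "think", "over", "here")

# Lowercase, turn anything that is not a letter or apostrophe into a space,
# then split on runs of spaces.
clean <- gsub("[^a-z']+", " ", tolower(dialogue))
words <- unlist(strsplit(trimws(clean), " +"))
words <- words[!words %in% stop_words]

# Keep only the top_n most frequent words.
top_n <- 3
counts <- head(sort(table(words), decreasing = TRUE), top_n)

word_stats <- data.frame(
  Word = names(counts),
  Frequency = as.integer(counts),
  # Share among the top_n words only, mirroring the table above.
  Percentage = round(100 * as.integer(counts) / sum(counts), 2)
)

print(word_stats)
```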
These locations serve as essential backdrops for character interactions and plot development throughout the series, each with its own unique atmosphere and significance to the story.
| House or School | Total Lines of Dialogue | Percentage of House-Attributed Dialogue |
|---|---|---|
| Gryffindor | 5394 | 81.3 |
| Slytherin | 883 | 13.3 |
| Ravenclaw | 239 | 3.6 |
| Hufflepuff | 73 | 1.1 |
| Beauxbatons | 28 | 0.4 |
| Durmstrang | 20 | 0.3 |
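
The line counts in this table total 6,637, fewer than the 7,444 rows in the dialogue data, which suggests that lines from speakers without a listed house or school are left out before the percentages are computed. The sketch below shows one way such a table could be produced in base R. The `House` column name, the empty-string convention for a missing house, and all values are assumptions made for illustration; only `Character.ID` and `Dialogue.ID` appear in the source.

```r
# Toy stand-ins for the Dialogue and Characters data. The House column and
# every value here are assumptions for this sketch.
dialogue <- data.frame(
  Dialogue.ID = 1:6,
  Character.ID = c(1L, 1L, 2L, 3L, 1L, 4L)
)
characters <- data.frame(
  Character.ID = 1:4,
  House = c("Gryffindor", "Gryffindor", "Slytherin", ""),
  stringsAsFactors = FALSE
)

# Attach each line's speaker house, then drop speakers with no recorded house.
lines_with_house <- merge(dialogue, characters, by = "Character.ID")
lines_with_house <- lines_with_house[lines_with_house$House != "", ]

# Count lines per house and express each count as a share of the kept lines.
counts <- sort(table(lines_with_house$House), decreasing = TRUE)
house_summary <- data.frame(
  House = names(counts),
  Lines = as.integer(counts),
  Percentage = round(100 * as.integer(counts) / sum(counts), 1)
)

print(house_summary)
```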
The Golden Trio (Harry Potter, Ron Weasley, and Hermione Granger) forms the central cast of the series.
Their dialogue distribution across the movies reveals interesting patterns in character development and story focus:
Several key observations from the dialogue analysis:
To find spell usage by character and house, we use a function that searches every line of dialogue for spell incantations and then records which character used which spell.
To analyze spell usage throughout the Harry Potter movies we examined the following datasets:
- Dialogue.csv: contains each line of dialogue and who said it
- Spells.csv: contains the spell incantations

Key variables include:

- Individual dialogue lines and the incantations they contain
- Character names
- House names
- Incantations (e.g. Sectumsempra, Expelliarmus)
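The SpellFinder function identifies spell incantations in dialogue by looping over every incantation in the spells data, using `str_detect` to find the lines of dialogue that contain it, labelling those lines with the incantation, and appending them to a running data frame of matches.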
The analysis tracks how these spells are used across different characters, chapters, and contexts throughout the series.
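
Before the full function, a small self-contained sketch of the same matching idea on toy data may help. It is not the authors' implementation: it uses base R's `grepl` with `fixed = TRUE` (literal, case-sensitive substring matching) instead of `str_detect`, and it carries `Character.ID` along so each match can be linked back to a speaker. The dialogue lines are made up for illustration.

```r
# Toy data. The incantations are real spell names; the dialogue lines are
# made up for this sketch.
dialogue <- data.frame(
  Character.ID = c(1L, 2L, 1L),
  Dialogue = c("Expelliarmus!", "Run!", "Accio Firebolt!"),
  stringsAsFactors = FALSE
)
spells <- data.frame(
  Incantation = c("Accio", "Expelliarmus"),
  stringsAsFactors = FALSE
)

# For each incantation, keep the rows whose dialogue contains it as a literal
# substring, carrying Character.ID along with the matched line.
matches <- do.call(rbind, lapply(spells$Incantation, function(spell) {
  hit <- grepl(spell, dialogue$Dialogue, fixed = TRUE)
  data.frame(
    Character.ID = dialogue$Character.ID[hit],
    Dialogue = dialogue$Dialogue[hit],
    Spell = rep(spell, sum(hit)),
    stringsAsFactors = FALSE
  )
}))

print(matches)
```

Because the match is case-sensitive, an incantation spoken in a different case (for example all capitals) would be missed unless both sides are lowercased first.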

    library(stringr)  # provides str_detect()

    SpellFinder <- function(dialogueDF, spellsDF) {
      # Initialize the data frame that will hold each matching line of
      # dialogue and the spell found in it.
      Matches <- data.frame(
        Dialogue = character(),
        Spell = character()
      )
      # For each incantation, collect the lines of dialogue that contain it.
      # Note: str_detect() treats the incantation as a regular expression
      # and the match is case-sensitive.
      for (spell in spellsDF$Incantation) {
        MatchingLines <- dialogueDF$Dialogue[str_detect(dialogueDF$Dialogue, spell)]
        # Store the matching lines in a temporary data frame, labelling each
        # line with the spell from this iteration of the loop.
        MatchesTemp <- data.frame(
          Dialogue = MatchingLines,
          Spell = rep(spell, length(MatchingLines))
        )
        # Append this iteration's matches to the final data frame, which
        # holds every spell-containing line once the loop ends.
        Matches <- rbind(Matches, MatchesTemp)
      }
      return(Matches)
    }

To analyze wand characteristics and their distribution across the Harry Potter universe, we examined the following datasets:
Key variables analyzed include:
This analysis explored the Harry Potter universe through multiple datasets examining wand characteristics, spell usage patterns, and characters across houses and throughout the series. The primary goal was to uncover patterns and relationships in magical elements across different demographic groups.
Key findings include:
Implications:
Limitations: